**Creating the Plava Gaming Platform Using the Spartan-3E**

|  |  |  |  |
| --- | --- | --- | --- |
| Ashton Snelgrove  University of Utah  1518 S. 1000 E.  Salt Lake City, UT 84105  801-819-6344  snelgrov@eng.utah.edu | Jacob Sanders  University of Utah  Address  Phone  E-mail | Matthew Steadman  University of Utah  4318 Arden Ct  Taylorsville, UT 84123  801-783-0989  steadman@eng.utah.edu | William Graham  University of Utah  737 E 700 S  SLC, UT 84111  801-620-0266  U0527893@utah.edu |
|  |  |  |  |

**ABSTRACT**

In this paper, we describe the development of a generic console game platform built from the Xilinx Spartan-3E FPGA board, I/O devices, and additional circuitry. We then demonstrate a sample application of the platform: a clone of the Nintendo Entertainment System game Duck Hunt.

**Categories and Subject Descriptors**

B.6.3 [**Verilog**]: Design and implementation through a Xilinx FPGA of a gaming platform – *Processor, GPU, digital sound reproduction*

**General Terms**

Design, Languages

**Keywords**

Spartan-3E, Processor Design, 24-bit color, GPU, Assembly language, Light gun

# INTRODUCTION

The intent of this project was to create Plava, a 16-bit generic gaming platform that can be used to run and play video games. Plava is the Croatian word for the color blue, and refers to the development team, which was known as Team Blue throughout the development cycle.

Plava is capable of generating 24-bit color images, can play up to ten sounds at the same time, and is compatible with the Super Nintendo Entertainment System game pad controller and the Nintendo Entertainment System Zapper light gun. Use of the light gun requires an external VGA to TV converter box that is capable of displaying the images generated by the GPU on a standard tube television.

Permission is granted to make digital or hard copies of this work for any educational or personal purposes provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice, all author names, and full citation on the first page. Written permission to otherwise copy or distribute this work must be obtained from any one of the authors, according to the contact information given.

# THE PLAVA PROCESSOR

The Plava uses a 16-bit processor built around a modified version of the CR-16 instruction set. The processor contains sixteen 16-bit registers and an additional 5-bit program state register (PSR).

## Instruction Set

The Plava instruction set is a binary (two-argument) instruction set. In most instructions, the first argument is the destination that receives the result once the instruction has executed, and the second argument is an operand of the operation. Most instructions come in two types: a register type, where two registers are given and the result is stored in the first, and an immediate type, where a scalar value takes the place of the second register.

There are a total of twelve basic arithmetic instructions, covering addition, subtraction, and multiplication. Six of these are addition instructions: register and immediate forms of standard addition, addition with carry, and addition that does not affect the PSR. There are four subtraction instructions: register and immediate forms of standard subtraction and subtraction with carry. The remaining two are the register and immediate forms of multiply.

There are also register and immediate based bitwise and, or, and exclusive or operations comprising a total of six bitwise operations. Shifting operations are also implemented for register and immediate arguments, for both arithmetic and logical shifting. The value in the first register is shifted by the number stored in the second register or an immediate value. Left shifts are indicated by a positive shift amount, and right shifts by a negative shift amount.
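The signed shift convention can be illustrated with a short Python model. This is a minimal sketch, not the hardware implementation; the function names `shift16` and `to_signed16` are hypothetical, and the behavior for shift amounts larger than 15 is not specified in the text.

```python
MASK16 = 0xFFFF


def to_signed16(value: int) -> int:
    """Interpret a 16-bit pattern as a two's-complement integer."""
    value &= MASK16
    return value - 0x10000 if value & 0x8000 else value


def shift16(value: int, amount: int, arithmetic: bool = False) -> int:
    """Shift left for a positive amount and right for a negative amount."""
    value &= MASK16
    if amount >= 0:
        # Left shifts are identical for logical and arithmetic shifting.
        return (value << amount) & MASK16
    if arithmetic:
        # Python's >> on a negative int replicates the sign bit.
        return (to_signed16(value) >> -amount) & MASK16
    # Logical right shift fills the vacated bits with zeros.
    return value >> -amount
```

For example, `shift16(0x8000, -4)` models a logical right shift by four, while passing `arithmetic=True` keeps the sign bit set in the vacated positions.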

Several instructions get and set data values. The move instruction copies a value from a register or an immediate into another register; the move immediate form sign-extends its immediate value. Because the number of bits available for an immediate is limited, an additional load upper immediate instruction loads its immediate into the upper 8 bits of the register rather than the lower 8 bits. Placing a full 16-bit immediate in a register therefore takes both instructions: move immediate first, to set the lower 8 bits, followed by load upper immediate, which overwrites only the upper 8 bits and leaves the lower 8 bits alone.

There are also load and store instructions, which read values from and write values to the system's memory.
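The two-instruction sequence for building a 16-bit immediate can be sketched in Python as follows. The function names are hypothetical stand-ins for the move immediate and load upper immediate instructions, and the 8-bit immediate width is taken from the description above.

```python
def move_immediate(imm8: int) -> int:
    """Sign-extend an 8-bit immediate into a 16-bit register value."""
    imm8 &= 0xFF
    return imm8 | 0xFF00 if imm8 & 0x80 else imm8


def load_upper_immediate(reg: int, imm8: int) -> int:
    """Replace the upper 8 bits of reg and keep its lower 8 bits."""
    return ((imm8 & 0xFF) << 8) | (reg & 0x00FF)


def load_16bit(value: int) -> int:
    """Build a full 16-bit constant with the two-instruction sequence."""
    reg = move_immediate(value & 0xFF)          # lower byte, sign-extended
    return load_upper_immediate(reg, value >> 8)  # upper byte overwritten
```

The order matters: the sign extension performed by the first step may fill the upper byte with ones, and the second step is what corrects it.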

Two additional instructions were added to the instruction set: test and not. Test performs a bitwise and for the purpose of setting the PSR flags, but does not write the result back to a register. Not implements a unary bitwise not.

The PSR stores information about the result of certain instructions. The five flag bits record whether the previous instruction resulted in a carry or borrow, whether the value of the second operand was less than the first, whether arithmetic overflow occurred, whether the result of the operation was zero, and whether the result was negative. Any branch or jump instruction following the instruction that set the PSR uses the PSR flags to decide whether or not the jump or branch occurs.
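As an illustration, the five flags could be derived for a 16-bit addition as in the sketch below. The names `Flags` and `add_with_flags` are hypothetical, and treating the "less than" flag as an unsigned comparison of the two operands is an assumption; the text does not say whether the comparison is signed or unsigned.

```python
from typing import NamedTuple, Tuple


class Flags(NamedTuple):
    carry: bool
    low: bool
    overflow: bool
    zero: bool
    negative: bool


def add_with_flags(dst: int, src: int) -> Tuple[int, Flags]:
    """Add two 16-bit values and derive the five PSR flags."""
    dst &= 0xFFFF
    src &= 0xFFFF
    total = dst + src
    result = total & 0xFFFF
    # Signed overflow: both operands share a sign that the result lacks.
    overflow = bool(~(dst ^ src) & (dst ^ result) & 0x8000)
    flags = Flags(
        carry=total > 0xFFFF,            # carry out of bit 15
        low=src < dst,                   # assumed unsigned comparison
        overflow=overflow,
        zero=result == 0,
        negative=bool(result & 0x8000),  # sign bit of the result
    )
    return result, flags
```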

The branch and jump instructions allow choices to be made during program execution and are the only conditional instructions in the instruction set. A total of sixteen mnemonics are used with branches and jumps, each signifying a unique condition determined by the status of the PSR as set by the previous instruction. These mnemonics and their meanings are shown in Table 1.

## Architecture

The Plava processor architecture consists of a controller that fetches and interprets instructions, and an arithmetic logic unit (ALU) that processes them. This two-part design enables all instructions except the memory load instruction to be executed in two clock cycles, using a two-stage fetch and calculate process. Storage of the result of the calculate step is pipelined with the fetch stage.

The first stage is responsible for loading an instruction and decoding the operation. During the first clock cycle, the instruction is fetched from the current program counter location in memory, and the controller sets all appropriate multiplexer values in the rest of the processor.

The second stage is responsible for the calculation required by the operation. In the second clock cycle, values are loaded from the registers and sent to the ALU to be processed. The results from the ALU are then written back during the fetch phase of the next instruction. This partial pipelining of the write-back and instruction fetch stages allows the processor to execute an effective 25 million instructions per second.

The only exception to the two-stage cycle is the load instruction, which requires three clock cycles: the address is first read from a register, the data at that address is then fetched from memory, and the data is finally written back during the following fetch phase.
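The effect of the three-cycle load on throughput can be estimated with a small calculation. The clock rate is not stated in the text; 50 MHz is assumed here because it is the rate at which two cycles per instruction gives the quoted 25 million instructions per second. The function name and the `load_fraction` parameter are hypothetical.

```python
CLOCK_HZ = 50_000_000  # assumed clock rate, inferred from the 25 MIPS figure

CYCLES_DEFAULT = 2  # fetch + calculate
CYCLES_LOAD = 3     # address read, memory fetch, write back


def instructions_per_second(load_fraction: float = 0.0) -> float:
    """Estimate throughput when a given fraction of instructions are loads."""
    average_cycles = (CYCLES_LOAD * load_fraction
                      + CYCLES_DEFAULT * (1.0 - load_fraction))
    return CLOCK_HZ / average_cycles
```

With no loads the estimate matches the 25 million figure; any nonzero share of load instructions lowers it.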

### Arithmetic Logic Unit

The ALU is the most complex piece of the processor and as such is addressed individually. It is where all calculation is actually performed. It expects as inputs the values to be used in the calculation, an operation code, a function code, and a condition code; the latter three values are determined by the controller, which obtains them from the instruction. For jumps and branches, the current value of the PSR is read and used in conjunction with the condition code to determine the control flow. The output of the ALU is the result of the calculation. Certain operations also modify the flags in the PSR.

# MEMORY MAP

Describe the how the memory map does what it does, and the purposes of it

# GRAPHICS PROCESSING UNIT

## Overview

The graphics processing unit of the Plava platform is a bitmapped, sprite-based system. The system is capable of VGA output at True Color (24-bit) levels, and provides the programmer with an easy-to-use memory-mapped interface.

The system is based loosely on the graphics implementation of the Super Nintendo Entertainment System (SNES) [1]. The basic concepts of sprite object meta-data, bitmapped sprite tables, and a palette lookup table were used. These features were expanded to use the capabilities of the Spartan-3E FPGA.

## Sprite Object Table

The sprite object table consists of 256 objects, each of which contains information on the screen coordinate of each object, vertical priority, horizontal and vertical flipping, sprite table selection, size, and palette selection. The programmer is able to manipulate these objects through an array of structures located in memory. This allows the programmer to easily perform basic sprite manipulations, such as translation, by reading and writing directly to memory.

## Bitmap Tile Table

The system contains two 128x128 pixel sprite tile tables. 16-color indexed bitmaps are split into 8x8 pixel tiles and placed on the table. Each table therefore holds 16x16 tiles. Each sprite object contains a tile coordinate, which is the X and Y coordinate of the upper left tile of the sprite. By using the horizontal and vertical size from the sprite object, a section of the sprite table is isolated, which forms the complete sprite.
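The relationship between tile coordinates and pixels in a tile table can be sketched as follows. The row-major linear layout and the function name are assumptions made for illustration; the text gives the table and tile dimensions but not the memory layout.

```python
TILE_SIZE = 8                            # tiles are 8x8 pixels
TABLE_TILES = 16                         # each table holds 16x16 tiles
TABLE_PIXELS = TILE_SIZE * TABLE_TILES   # 128 pixels per side


def sprite_pixel_address(tile_x: int, tile_y: int, px: int, py: int) -> int:
    """Return the linear pixel address for pixel (px, py) of a sprite.

    (tile_x, tile_y) is the upper left tile of the sprite; (px, py) is
    measured from the sprite's upper left corner and may span several tiles.
    """
    x = tile_x * TILE_SIZE + px
    y = tile_y * TILE_SIZE + py
    if not (0 <= x < TABLE_PIXELS and 0 <= y < TABLE_PIXELS):
        raise ValueError("pixel lies outside the 128x128 tile table")
    return y * TABLE_PIXELS + x  # assumed row-major layout
```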

## Rendering

The VGA standard generates 640x480 pixels at a refresh rate of 60 Hz. This provides significant time per screen refresh to place the sprites onto the screen. However, because of memory limitations, only the next scan line and the current scan line can be stored in a frame buffer. A block RAM is dedicated to the front and back scan line buffer.

At the beginning of each scan line, the front and back buffers are swapped. The back buffer then begins calculating the contents of the next scan line. Initially, the system reads from a register, called sprite priority, which determines the rendering order of sprites. When drawing a scan line, the first sprite drawn will be the sprite indicated by this register. After this sprite is drawn, the GPU draws the remaining sprites in numerical order. For example, if the priority is set to 10, then the 10th sprite will be drawn, then the 11th, and so on. This allows the programmer to prioritize the drawing of specific sprites.
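The resulting draw order can be expressed compactly. This sketch assumes that the sequence wraps around to sprite 0 after the last sprite so that every object is still visited once; the text only says "and so on", so the wrap-around is an assumption, as is the function name.

```python
SPRITE_COUNT = 256  # size of the sprite object table


def draw_order(priority: int) -> list:
    """List sprite indices in the order they are drawn for one scan line."""
    # Start at the sprite named by the priority register, then continue in
    # numerical order, wrapping past the end of the table (assumed).
    return [(priority + i) % SPRITE_COUNT for i in range(SPRITE_COUNT)]
```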

As each sprite object is read from the sprite object table, the system culls sprites without the active flag and sprites which do not intersect the current scan line. If the sprite is active and does intersect, the sprite data is then passed on to the line buffer to be drawn. The line buffer calculates the address of a 1x8 pixel horizontal slice from the tile table. This is then copied into the back buffer at the appropriate horizontal offset, along with z-level priority information and the palette from the sprite object.
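The culling test described above amounts to a flag check and a range check. In the sketch below, the `SpriteObject` fields are a hypothetical subset of the sprite object meta-data, and measuring the sprite height in 8-pixel tiles is an assumption.

```python
from dataclasses import dataclass

TILE_SIZE = 8


@dataclass
class SpriteObject:
    active: bool
    x: int             # screen coordinate of the left edge
    y: int             # screen coordinate of the top edge
    height_tiles: int  # vertical size in tiles (assumed unit)


def visible_on_line(sprite: SpriteObject, line: int) -> bool:
    """Return True if the sprite must be drawn on the given scan line."""
    if not sprite.active:
        return False
    return sprite.y <= line < sprite.y + sprite.height_tiles * TILE_SIZE


def slice_row(sprite: SpriteObject, line: int) -> int:
    """Row within the sprite bitmap that falls on the given scan line."""
    return line - sprite.y
```

A sprite that passes `visible_on_line` is handed to the line buffer, which uses the row from `slice_row` to locate the 1x8 slices to copy.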

Rendering follows consistent rules for overlapping sprites. A pixel is considered transparent if the index of the pixel is 0. If a pixel has not been updated on this line, then the pixel from the sprite being rendered is automatically written to the line. If the pixel has already been updated, then the new pixel is checked for transparency, and ignored if transparent. If the new pixel is non-transparent, then the z-levels of the two pixels are compared. If the new pixel has a z-level greater than the current pixel, then the new pixel is written. Otherwise, if the new z-level is less than or equal to that of the current pixel, the pixel remains the same. This means that sprites on the same z-level are prioritized on a first-rendered basis.
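These overlap rules can be modeled per pixel as in the following sketch. The `LinePixel` record and the `write_pixel` function are hypothetical names; the decision sequence follows the rules as stated above, including the unconditional first write.

```python
from dataclasses import dataclass


@dataclass
class LinePixel:
    updated: bool = False  # has any sprite written this pixel on this line?
    index: int = 0         # palette index; 0 means transparent
    z: int = 0             # z-level of the sprite that wrote the pixel
    palette: int = 0       # palette selected by that sprite


def write_pixel(current: LinePixel, index: int, z: int, palette: int) -> None:
    """Apply the overlap rules for one incoming sprite pixel."""
    if current.updated:
        if index == 0:
            return  # transparent pixels never replace an existing pixel
        if z <= current.z:
            return  # ties go to the sprite rendered first
    # Either the first write on this line or a strictly higher z-level.
    current.updated = True
    current.index = index
    current.z = z
    current.palette = palette
```

Writing a pixel at z-level 1 and then another at the same level leaves the first in place, which is the first-rendered behavior noted above.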

## Output Generation

As the scan line moves across the screen horizontally, the front buffer is read pixel by pixel. If the pixel is non-transparent, then the pixel is used; otherwise, a separately generated background index is used. The final index is then passed into the palette lookup table, which in turn outputs the final color to the display.

Because each bitmap in the sprite table is indexed, a palette lookup table is used to convert a palette index number into a 24-bit RGB color. The system uses 16 colors per palette, with 32 distinct palettes available. The palette is selected as part of the sprite object, and the palette index is selected by each pixel in the sprite bitmap.
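The lookup can be pictured as indexing a 512-entry table with the palette number and the pixel index. In this sketch the flat table layout, the 0xRRGGBB packing of each entry, and the function names are assumptions for illustration.

```python
PALETTES = 32
COLORS_PER_PALETTE = 16
TABLE_ENTRIES = PALETTES * COLORS_PER_PALETTE  # 512 colors in total


def palette_address(palette: int, index: int) -> int:
    """Combine the sprite's palette number and the pixel's color index."""
    return palette * COLORS_PER_PALETTE + index


def lookup_rgb(table: list, palette: int, index: int) -> tuple:
    """Return the (red, green, blue) channels, 8 bits each."""
    color = table[palette_address(palette, index)]  # assumed 0xRRGGBB
    return (color >> 16) & 0xFF, (color >> 8) & 0xFF, color & 0xFF
```

Each of the three returned channels corresponds to one 8-bit input of the video DAC.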

The on-board VGA connector of the Spartan-3E starter development kit only supports 1 bit per color channel, for 3-bit color. The Plava system uses an off-board video digital to analog converter (DAC) to allow 8 bits per color channel, for 24-bit color. The DAC selected is the Texas Instruments THS8134B [2]. This chip is designed specifically for use in video systems, including VGA; it features three matched DACs and a parallel digital interface. The circuit was built using the datasheet and a reference design by Altium [3].

# SOUND SYSTEM

Describe the sound

# USER INPUT

Introduce the Input devices (can be omitted if sub points are sufficiently detailed)

## Light Gun

Describe how it works and how to interface with the light-gun. Include a schematic of the circuitry to connect it with the system

## Super Nintendo Controller

Describe how the controller works and how to interface with it.

# PROGRAMMING

## Assembler

The assembler for the Plava platform was designed to provide programmer-friendly features that increase productivity and ease of use. It runs from the command line.

The assembler was written in Python 3.1, using the PLY Lex-Yacc module [4]. PLY provides a Python version of the Unix lexing and parsing tools *lex* and *yacc*. A set of lexer rules was written using regular expressions to recognize tokens such as operations, labels, registers, and immediate values. A set of parse rules was implemented as a context-free grammar. A macro system takes C-style *.define* statements for macro replacement.

The system works in a three-pass fashion. First, the assembly code is scanned line by line, parsing macro definitions and replacing macro instances with the replacement value. Next, lexer tokenization and grammar parsing are applied. As the assembly code is tokenized, operations are counted to calculate the addresses of labels. As the grammar is applied to the tokens, a list of operation objects is created. Finally, once parsing is complete, the list of operation objects is traversed, with each operation producing a set of machine code instructions. These instructions are then written to the output file.
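The first two passes can be sketched in plain Python without PLY. This is a simplified illustration, not the assembler's actual code: the exact *.define* syntax, the `name:` label form, the one-address-per-operation counting, and the mnemonics in the test input are all assumptions. The third pass is omitted because the machine code encodings are not given here.

```python
import re


def expand_macros(lines):
    """Pass 1: collect .define macros and substitute them in later lines."""
    macros = {}
    expanded = []
    for line in lines:
        parts = line.split()
        if parts and parts[0] == ".define":
            # Assumed form: .define NAME replacement text
            macros[parts[1]] = " ".join(parts[2:])
            continue
        for name, value in macros.items():
            pattern = r"\b" + re.escape(name) + r"\b"
            line = re.sub(pattern, lambda _m, v=value: v, line)
        expanded.append(line)
    return expanded


def collect_labels(lines):
    """Pass 2 (simplified): count operations to assign label addresses."""
    labels = {}
    operations = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.endswith(":"):
            # A label takes the address of the next operation.
            labels[line[:-1]] = len(operations)
        else:
            operations.append(line)
    return labels, operations
```

In the real assembler the second pass is driven by the PLY lexer and grammar, and a pseudo-operation may expand to more than one instruction, so label addresses would need to account for that.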

The Plava assembler features *.text* and *.data* segments, allowing the programmer to define variables, arrays, and strings during coding. The labels of the data segment can be used in the same manner as other labels, allowing the programmer to refer easily to the addresses of data values.

Several additional features are provided by the assembler for ease of development. Registers can be referred to by number ($0 - $15), or by the MIPS convention (e.g. $t0, $s1, $a2, $v1). The conventions of MIPS were used during development. Saved registers ($s0-$s3), argument registers ($a0-$a2), the frame pointer ($fp), and the stack pointer ($sp) are callee saved. Temporary registers ($t0-$t3), return value registers ($v0-$v1), and the return address register ($ra) are caller saved. Various pseudo-operations are provided to allow easy manipulation of the stack, stack frames, and procedure calls. Additional pseudo-operations are provided for loading 16-bit immediate values, no-op, and arithmetic negation.

Because sophisticated lexer rules had already been written for the assembler, they were easily converted into the lexer rules used for syntax highlighting in the Gnome *gedit* text editor. This proved to be extremely valuable during development.

To aid in debugging, the assembler also provides a command line object dump mode, modeled after the Unix *objdump* utility. Following assembly, if the appropriate command line flag is set, the assembler displays a listing of all macros found, and a human-readable table of machine code matched with the corresponding labels and assembly code.

## Demo Application – Duck Hunt

Describe the Duck Hunt design

# ACKNOWLEDGEMENTS

Texas Instruments provided development samples of the THS8134B video digital to analog converter.

# REFERENCES

[1] Anomie's Register Doc

<http://www.romhacking.net/docs/%5B196%5Dregs.txt>

[2] Texas Instruments THS8134B Datasheet

<http://www.ti.com/lit/gpn/ths8134b>

[3] Altium PB-01 Audio/Video Peripheral Board Schematic

<http://wiki.altium.com/download/attachments/3080506/PB01+Audio-Video+Peripheral+Board+Schematics.pdf?version=1&modificationDate=1228268275552>

[4] PLY (Python Lex-Yacc)

<http://www.dabeaz.com/ply/>